latent: an R library for Latent Variable Modeling

Marcos Jimenez, Mauricio Garnier-Villarreal, & Vithor Rosa Franco

2026-02-05

Latent Variable Modeling

A latent variable model is a way of connecting things we can measure directly (called observed or manifest variables) to hidden qualities we cannot measure directly (called latent variables). These models are used in many areas like biology, computer science, and social sciences. Latent variable models can involve either categorical or continuous observed and hidden variables, as below:

  Latent variables   Continuous (Manifest)      Categorical (Manifest)
  Continuous         Factor Analysis            Item Response Theory
  Categorical        Latent Profile Analysis    Latent Class Analysis


The latent R package

  • Few arguments for a straightforward analysis
  • lavaan syntax for Structural Equation Models
  • Customizable models
  • Core functions written in C++ with the armadillo library
  • Parallelization of multiple random starts to address local maxima

High-performance computing

  • Core functions written in C++ with the armadillo library

  • Parallelization of multiple random starts to address local maxima

Latent Class Analysis

Latent class analysis (LCA) is an umbrella term that refers to a number of techniques for estimating unobserved group membership based on a parametric model of one or more observed indicators of group membership.

People belong to different groups (i.e., classes) that are not directly observable. The model has two types of parameters:

  • The probability that a person belongs to a particular class \(k\): \(P(\theta_k)\).

  • The conditional probability (density) of a response to item \(j\) if a person belongs to the class \(k\): \(f(y_j\mid \theta_k)\).

The likelihood of a response pattern is given by \[ \ell \;=\; \sum_{k=1}^{K} P(\theta_k)\, \prod_{j=1}^J f(y_j\mid \theta_k). \]
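As a toy illustration in base R (not part of the latent API), this likelihood can be computed directly for, say, two classes and two binary items:

```r
# Toy LCA likelihood of a single response pattern:
# K = 2 classes, J = 2 binary items.
class_prob <- c(0.6, 0.4)                # P(theta_k)
item_prob  <- rbind(item1 = c(0.9, 0.2), # f(y_j = 1 | theta_k),
                    item2 = c(0.8, 0.3)) # rows = items, cols = classes

y <- c(1, 1)                             # observed pattern: both items endorsed

# f(y_j | theta_k): endorsement probability when y_j = 1, its
# complement when y_j = 0 (y recycles down the rows, i.e. over items)
cond <- item_prob * y + (1 - item_prob) * (1 - y)

# Sum over classes of P(theta_k) * prod_j f(y_j | theta_k)
lik <- sum(class_prob * apply(cond, 2, prod))
lik  # 0.6 * 0.9 * 0.8 + 0.4 * 0.2 * 0.3 = 0.456
```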

Categorical indicators example

For this analysis, we will employ the gss82 example data set included in the latent package. Sourced from the poLCA package, this data set comes from 1,202 respondents to the 1982 General Social Survey.

Model fitting:

Code
fit <- lca(data = gss82, nclasses = 3,
           item = rep("multinomial", ncol(gss82)))
fit
latent 0.1.0 converged after 139 iterations

  Estimator                                     Penalized-ML
  Optimization method                           lbfgs
  Number of model parameters                    20

  Number of observations                        1202

  Number of response patterns (including NA)    33

  Number of possible patterns                   35

  ------------------------------------------------------

Model Test User Model:
  Test statistic (L2)                           22.088
  Degrees of freedom                            15
  P-value (L2)                                  0.106

Categorical indicators example

Model fitting:

Code
fit <- lca(data = gss82, nclasses = 3,
           item = rep("multinomial", ncol(gss82)))
fit@loglik            # -2754.643
fit@penalized_loglik  # -2759.507
fit@Optim$iterations  # 66
fit@Optim$convergence # TRUE
fit@timing            # 0.0184812

Model information:

Code
# Plot model fit info:
fit

# Get fit indices:
getfit(fit)

# Inspect model objects:
latInspect(fit, what = "coefs", digits = 3)
latInspect(fit, what = "classes", digits = 3)
latInspect(fit, what = "profile", digits = 3)
latInspect(fit, what = "posterior", digits = 3)

# Get confidence intervals:
CI <- ci(fit, type = "standard",
         confidence = 0.95, digits = 2)
CI$table

Model fit indices

The getfit function extracts several fit indices from the fitted model.

Code
## Get fit indices
getfit(fit)
        nclasses             npar             nobs           loglik 
           3.000           20.000         1202.000        -2754.643 
penalized_loglik               L2              dof           pvalue 
       -2759.507           22.088           15.000            0.106 
             AIC              BIC             AIC3             CAIC 
        5549.287         5651.122         5569.287         5671.122 
             KIC            SABIC              ICL             AICp 
        5572.287         5587.594        -6011.738         5559.013 
            BICp            AIC3p            CAICp             KICp 
        5660.848         5579.013         5680.848         5582.013 
          SABICp             ICLp       R2_entropy 
        5597.320        -6021.465            0.580 
attr(,"class")
[1] "getfit.llca"
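These criteria follow their standard definitions, so several of them can be reproduced by hand from the log-likelihood, the number of parameters, and the sample size reported above (a sanity check in base R, not a latent function):

```r
# Values taken from the getfit() output above
loglik <- -2754.643
npar   <- 20
nobs   <- 1202

AIC  <- -2 * loglik + 2 * npar                # ~ 5549.287
BIC  <- -2 * loglik + log(nobs) * npar        # ~ 5651.122
AIC3 <- -2 * loglik + 3 * npar                # ~ 5569.287
CAIC <- -2 * loglik + (log(nobs) + 1) * npar  # ~ 5671.122
```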

Profile table

The profile table contains the model parameter estimates.

Code
latInspect(fit, what = "profile")
$class
Class1 Class2 Class3 
 0.617  0.179  0.204 

$item
$item$PURPOSE
              Class1 Class2 Class3
Good           0.891  0.159  0.916
Depends        0.052  0.222  0.071
Waste of time  0.057  0.619  0.014

$item$ACCURACY
            Class1 Class2 Class3
Mostly true  0.615  0.043  0.653
Not true     0.385  0.957  0.347

$item$UNDERSTA
          Class1 Class2 Class3
Good       0.996  0.753  0.324
Fair/Poor  0.004  0.247  0.676

$item$COOPERAT
            Class1 Class2 Class3
Interested   0.945  0.643  0.688
Cooperative  0.055  0.256  0.258
Impatient    0.000  0.101  0.054
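A quick consistency check on this output: the class proportions sum to one, and within each class the conditional response probabilities of each item also sum to one (up to rounding). For instance, using the PURPOSE block above:

```r
# Class proportions and the PURPOSE block from the profile table above
class_prob <- c(Class1 = 0.617, Class2 = 0.179, Class3 = 0.204)
purpose <- rbind("Good"          = c(0.891, 0.159, 0.916),
                 "Depends"       = c(0.052, 0.222, 0.071),
                 "Waste of time" = c(0.057, 0.619, 0.014))

sum(class_prob)   # ~ 1
colSums(purpose)  # each column ~ 1, up to rounding
```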

Continuous indicators example

Model fitting:

Code
fit <- lca(data = empathy[, 1:6], nclasses = 4L,
           item = rep("gaussian", ncol(empathy[, 1:6])))

fit@loglik                # -1841.336
fit@penalized_loglik      # -1844.333
fit@Optim$iterations      # 49
fit@Optim$convergence     # TRUE
fit@timing                # 0.2064401

Continuous indicators example

Profile output:

Code
latInspect(fit, what = "profile", digits = 3)
$class
Class1 Class2 Class3 Class4 
 0.248  0.184  0.153  0.416 

$item
$item$ec1
      Class1 Class2 Class3 Class4
Means  2.458  2.249  3.007  2.694
Stds   0.458  0.466  0.532  0.451

$item$ec2
      Class1 Class2 Class3 Class4
Means  2.717  2.173  3.389  3.086
Stds   0.261  0.505  0.541  0.527

$item$ec3
      Class1 Class2 Class3 Class4
Means  2.735  2.338  3.666  3.275
Stds   0.291  0.459  0.561  0.389

$item$ec4
      Class1 Class2 Class3 Class4
Means  2.834  2.283  3.777  3.282
Stds   0.356  0.547  0.404  0.330

$item$ec5
      Class1 Class2 Class3 Class4
Means  3.041  2.700  4.151  3.472
Stds   0.279  0.365  0.335  0.322

$item$ec6
      Class1 Class2 Class3 Class4
Means  3.182  2.850  4.230  3.657
Stds   0.323  0.439  0.386  0.371

Mixed indicators example

Model fitting:

Code
fit <- lca(data = cancer[, 1:6], nclasses = 3L,
           item = c("gaussian", "gaussian",
                    "multinomial", "multinomial",
                    "gaussian", "gaussian"))
fit@loglik                # -5784.701
fit@penalized_loglik      # -5795.573
fit@Optim$iterations      # 111
fit@Optim$convergence     # TRUE
fit@timing                # 0.1927969

Two-step LCA analysis with covariates

  • Step 1, fit the measurement model without the covariates:
Code
# Measurement model:
fit0 <- lca(data = empathy[, 1:6], nclasses = 4L,
            item = rep("gaussian", ncol(empathy[, 1:6])))
  • Step 2, fit the model with covariates fixing the measurement part:
Code
fit <- lca(data = empathy[, 1:6],
           X = empathy[, 7:8],
           model = fit0,
           item = rep("gaussian", ncol(empathy[, 1:6])),
           nclasses = 4L)
fit@loglik                # -1806.426
fit@penalized_loglik      # -1809.614
fit@Optim$iterations      # 38
fit@Optim$convergence     # TRUE
fit@timing                # 0.1596549

Two-step LCA analysis with covariates

Profile output:

Code
latInspect(fit, what = "coefs", digits = 3)
            Class1 Class2 Class3 Class4
(Intercept)      0 -3.246 -8.572 -3.304
pt1              0  0.598  0.918  0.932
pt2              0  1.155  2.463  0.677
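Since the structural part is a multinomial logistic regression in which Class1 appears to serve as the reference class (its coefficients are fixed at zero), the coefficients can be converted to odds ratios by exponentiation. A sketch using the estimates printed above:

```r
# Coefficients from the latInspect() output above
coefs <- cbind(Class1 = c(0, 0, 0),
               Class2 = c(-3.246, 0.598, 1.155),
               Class3 = c(-8.572, 0.918, 2.463),
               Class4 = c(-3.304, 0.932, 0.677))
rownames(coefs) <- c("(Intercept)", "pt1", "pt2")

# Odds ratios relative to the reference class: a unit increase in pt1
# multiplies the odds of Class2 (vs Class1) by exp(0.598), about 1.82
OR <- exp(coefs)
round(OR["pt1", ], 2)
```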
Code
plot(fit,
     type = "standard",
     what = "OR",
     effects = "coding",
     show_est_ci = TRUE,
     confidence = 0.95,
     intercept = FALSE,
     predictors = NULL,
     est_ci_header_cex = 0.8,
     cex_y = 0.8,
     mfrow = c(2, 2),
     xlim = c(0, 5))

Regularization

Introducing regularization in the model:

Code
penalties <- list(
  class = list(alpha  = 1), # Bayes constant for class probabilities
  prob  = list(alpha  = 1), # Bayes constant for item probabilities
  sd    = list(alpha  = 1), # Bayes constant for item variances
  beta  = list(alpha  = 0,             # MAP regularization
               lambda = 0, power = 0)  # L-type regularization
)
Code
fit <- lca(data = empathy[, 1:6],
           X = empathy[, 7:8],
           model = fit0,
           item = rep("gaussian", ncol(empathy[, 1:6])),
           nclasses = 4L, 
           penalties = penalties)

L-type regularization for the predictor coefficients, \[ \lambda \; \| \beta \|_2^L, \] or MAP estimation with a Gaussian prior with precision hyperparameter \(\alpha\),

\[ \beta_{std} \sim N(0, 1/\alpha). \]
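As an illustration of what these two penalties compute (a sketch only; the exact objective inside latent may be parameterized differently):

```r
# Illustrative predictor coefficients (arbitrary values)
beta <- c(0.5, -0.9, 0.3)

# L-type penalty: lambda * ||beta||_2^L
l_penalty <- function(beta, lambda, power) lambda * sqrt(sum(beta^2))^power

# MAP penalty: negative log-density of a N(0, 1/alpha) prior, up to a constant
map_penalty <- function(beta, alpha) (alpha / 2) * sum(beta^2)

l_penalty(beta, lambda = 0.5, power = 2)  # ridge-like when power = 2
map_penalty(beta, alpha = 1)              # same value: L2 case with lambda = alpha/2
```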

Real example of regularization with MAP

[Figure: coefficient estimates without regularization vs. with MAP regularization]

Factor Analysis

Factor Analysis (FA) is a method that estimates the influence of \(K\) continuous latent variables on a set of \(J\) items.

The score in item \(j\) is a weighted sum of the \(K\) latent factors: \[ X_j = \sum_{k=1}^K \lambda_{jk}F_k + \epsilon_j. \]

Under some assumptions, the \(J\) regressions can be encoded in a model for the covariance matrix of the items:

\[ S = \Lambda \Psi \Lambda^\top + \Theta. \]

  • \(\Lambda\) is a \(J \times K\) matrix containing the regression coefficients.

  • \(\Psi\) is the correlation matrix between the \(K\) latent factors.

  • \(\Theta\) is the error covariance matrix.
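This decomposition is easy to verify numerically. A small hand-built example (illustrative numbers, not taken from any data set) with \(J = 4\) standardized items and \(K = 2\) correlated factors:

```r
# Model-implied covariance: Lambda %*% Psi %*% t(Lambda) + Theta
Lambda <- cbind(F1 = c(0.8, 0.7, 0.0, 0.0),
                F2 = c(0.0, 0.0, 0.9, 0.6))  # J x K loading matrix
Psi    <- matrix(c(1, 0.3,
                   0.3, 1), 2, 2)            # factor correlation matrix
Theta  <- diag(c(0.36, 0.51, 0.19, 0.64))    # unique (error) variances

Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta
round(Sigma, 3)
# The unique variances were chosen as 1 - lambda^2, so the implied
# item variances are exactly 1 (a standardized solution)
diag(Sigma)
```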

Positive-definite constraints

In the factor model equation, \[ \Lambda \color{red}{\Psi} \Lambda^\top + \color{red}{\Theta}, \]

The latent correlation matrix \(\color{red}{\Psi}\) and the error covariance matrix \(\color{red}{\Theta}\) should be at least positive-semidefinite, but…

Positive-definite constraints (lavaan fails)

Let’s force an instance where lavaan fails to converge to a proper solution.

Code
library(lavaan)
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9
          x1 ~~ x5
          x1 ~~ x4
          x4 ~~ x5
          x4 ~~ x6'
fit2 <- cfa(model, data = HolzingerSwineford1939,
            estimator = "ml", std.lv = TRUE, std.ov = TRUE)
inspect(fit2, what = "est")$theta      # Error covariances
       x1     x2     x3     x4     x5     x6     x7     x8     x9
x1  0.455                                                        
x2  0.000  0.805                                                 
x3  0.000  0.000  0.618                                          
x4  0.084  0.000  0.000  0.244                                   
x5  0.060  0.000  0.000  0.132  0.521                            
x6  0.000  0.000  0.000 -0.202  0.000 -0.087                     
x7  0.000  0.000  0.000  0.000  0.000  0.000  0.667              
x8  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.455       
x9  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.575
Code
det(inspect(fit2, what = "est")$theta) # Check the determinant
[1] -0.00118551

Positive-definite constraints

  • Parameterize the polychoric correlation matrix as a crossproduct: \[ \Sigma = X^\top X \]
  • Constrain X to be oblique: \[ X \in \mathbb{R}^{p\times p}: \text{diag}(X^\top X) = I \]

\[ \begin{bmatrix} \color{red}{0.04} & \color{blue}{1.00} & \color{green}{-0.40} \\ \color{red}{-0.95} & \color{blue}{-0.07} & \color{green}{-0.40} \\ \color{red}{-0.32} & \color{blue}{-0.04} & \color{green}{0.83} \end{bmatrix}^\top \begin{bmatrix} \color{red}{0.04} & \color{blue}{1.00} & \color{green}{-0.40} \\ \color{red}{-0.95} & \color{blue}{-0.07} & \color{green}{-0.40} \\ \color{red}{-0.32} & \color{blue}{-0.04} & \color{green}{0.83} \end{bmatrix} = \begin{bmatrix} 1.00 & 0.11 & 0.10 \\ 0.11 & 1.00 & -0.41 \\ 0.10 & -0.41 & 1.00 \end{bmatrix} \]
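The example above is straightforward to verify in base R: after normalizing each column of X to unit length, crossprod(X) has a unit diagonal and is positive semidefinite by construction, i.e. it is a valid correlation matrix:

```r
# X from the example above (columns already roughly unit length)
X <- cbind(c( 0.04, -0.95, -0.32),
           c( 1.00, -0.07, -0.04),
           c(-0.40, -0.40,  0.83))
X <- sweep(X, 2, sqrt(colSums(X^2)), "/")   # enforce diag(X'X) = I exactly

Sigma <- crossprod(X)                       # X'X
round(Sigma, 2)                             # close to the matrix shown above
                                            # (inputs are rounded to 2 decimals)
min(eigen(Sigma, symmetric = TRUE)$values)  # positive: Sigma is positive definite
```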

Positive-definite latent covariances

  • Parameterize latent covariances as crossproducts:

\[ \Psi = Y^\top Y \\ \Theta = U^\top U \]

  • Constrain Y and U to be orthoblique: \[ X \in \mathbb{R}^{p\times p}: X^\top X = \text{sparse matrix} \]

\[ \begin{bmatrix} \color{red}{0.08} & \color{blue}{1.76} & \color{green}{0.04} \\ \color{red}{-1.95} & \color{blue}{-0.12} & \color{green}{-0.69} \\ \color{red}{-0.67} & \color{blue}{-0.08} & \color{green}{2.02} \end{bmatrix}^\top \begin{bmatrix} \color{red}{0.08} & \color{blue}{1.76} & \color{green}{0.04} \\ \color{red}{-1.95} & \color{blue}{-0.12} & \color{green}{-0.69} \\ \color{red}{-0.67} & \color{blue}{-0.08} & \color{green}{2.02} \end{bmatrix} = \begin{bmatrix} 4.24 & 0.42 & 0.00 \\ 0.42 & 3.11 & 0.00 \\ 0.00 & 0.00 & 4.56 \end{bmatrix} \]

Positive-definite latent covariances

Code
model <- 'visual  =~ x1 + x2 + x3
          textual =~ x4 + x5 + x6
          speed   =~ x7 + x8 + x9
          x1 ~~ x5
          x1 ~~ x4
          x4 ~~ x5
          x4 ~~ x6'
fit <- lcfa(data = HolzingerSwineford1939, model = model,
            estimator = "ml", std.lv = TRUE, positive = TRUE,
            control = list(rstarts = 10))
round(latInspect(fit, what = "est")[[1]]$theta, 3)
      x1    x2    x3     x4    x5     x6    x7    x8    x9
x1 0.453 0.000 0.000  0.082 0.041  0.000 0.000 0.000 0.000
x2 0.000 0.805 0.000  0.000 0.000  0.000 0.000 0.000 0.000
x3 0.000 0.000 0.626  0.000 0.000  0.000 0.000 0.000 0.000
x4 0.082 0.000 0.000  0.336 0.124 -0.080 0.000 0.000 0.000
x5 0.041 0.000 0.000  0.124 0.440  0.000 0.000 0.000 0.000
x6 0.000 0.000 0.000 -0.080 0.000  0.077 0.000 0.000 0.000
x7 0.000 0.000 0.000  0.000 0.000  0.000 0.672 0.000 0.000
x8 0.000 0.000 0.000  0.000 0.000  0.000 0.000 0.462 0.000
x9 0.000 0.000 0.000  0.000 0.000  0.000 0.000 0.000 0.571
Code
det(latInspect(fit, what = "est")[[1]]$theta)
[1] 0.0002812675

Cooking new stuff

  • Standard errors for 2-step models

  • Expectation-Maximization algorithm

  • (Exploratory) Structural Equation Modeling

  • Hidden Markov Models

Release date? Soon

Download the beta version at https://github.com/Marcosjnez/latent

Contact: m.j.jimenezhenriquez@vu.nl